Boston is a cold city, and its weather shapes life there in many ways. Among these, we are mainly interested in the relationship between weather and 311 reports.
To begin with, our datasets come from four sources: Boston weather, the 311 dataset, median household income, and Boston population by neighborhood. We extracted the useful information from these datasets and combined them into a final dataset. Based on that dataset, we came up with several questions to investigate: How many snow-related 311 requests are there? What types of 311 requests happen in extreme weather? How do 311 reports vary by region and median household income, especially in extreme weather conditions?
First, we compare how often the word "snow" appears in 311 request titles on days with and without snowfall; the results are quite close to what we expected, and show how seriously snowfall affects Bostonians. For the request types that occur in extreme weather, we focus on three typical winter conditions in Boston: snowy, windy, and chilly. By turning the reports into word clouds, we can see what kinds of problems Bostonians encounter during severe winter weather. Going deeper and correlating these three weather events with each region tells us more specifically which 311 services are in demand and which troubles people in each region face. Regions of Boston also vary by income, so we examine the relationship between income and the 311 reports residents of each region make, in order to characterize the requests that wealthier and poorer Bostonians file in snowy and chilly weather.
After sampling several weather datasets (and creating more temporary accounts and signing up for more free trials than we can count), we decided to use this dataset posted to kaggle.com (a data science competition website). It suits our purposes because it provides day-by-day information on what the weather was like in Boston over the past several years.
We use the standard 311 dataset from data.boston.gov.
We use census data from boston.gov. We entered this part manually, since there are not very many neighborhoods in Boston.
When we realized we need to know the population of each neighborhood in Boston, we pulled that information from here.
Before we can ask any questions of our data, we need to put it in a form conducive to analysis. We start by loading each dataset and cleaning up the columns that will be used for merging with other datasets. We also do any computations that are most easily done pre-merging.
# All imports go in this block
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from re import sub
from decimal import Decimal
from wordcloud import WordCloud, STOPWORDS
from scipy import stats
# Load the weather dataset
df_weather = pd.read_csv("data/Boston weather_clean.csv")
For the purposes of answering questions about Snow in Boston, it would be helpful to know a few things about how likely it is that snow is around.
First of all, it would be helpful if each day was labelled by whether it was the first day of snowfall, or the third day of a long snow storm. For this reason, we add the consecutive_snow_days column.
Next, for those days on which no snow was reported, it would be nice to know if there had recently been snowfall, which would cause snow to still affect 311 requests on that day. For this reason we added the days_since_last_snow column to the dataset.
Lastly, in case we want to look at accumulated snow over several days, we keep track of an accumulated_snow column, which adds up the snowfall in inches for consecutive snow days.
# Compute snow-related metrics
consecutive_snow_days = [0]
days_since_last_snow = [365]
accumulated_snow = [0]
for index, row in df_weather.iterrows():
    if row['Events'] == 'Snow' or row['Events'] == 'Both' or row['Snowfall (in)'] > 0:
        if consecutive_snow_days[-1] == 0:
            accumulated_snow.append(float(row['Snowfall (in)']))
        else:
            accumulated_snow.append(accumulated_snow[-1] + row['Snowfall (in)'])
        consecutive_snow_days.append(consecutive_snow_days[-1] + 1)
        days_since_last_snow.append(0)
    else:
        accumulated_snow.append(accumulated_snow[-1])
        consecutive_snow_days.append(0)
        days_since_last_snow.append(days_since_last_snow[-1] + 1)
df_weather['consecutive_snow_days'] = consecutive_snow_days[1:]
df_weather['days_since_last_snow'] = days_since_last_snow[1:]
df_weather['accumulated_snow'] = accumulated_snow[1:]
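As a sanity check on the loop above, the same running metrics can be computed for a small synthetic sequence of daily snowfall totals (the values below are hypothetical, not from our dataset):

```python
# Hypothetical daily snowfall totals, in inches
snowfall = [0.0, 2.0, 1.5, 0.0, 0.0]

consecutive, since_last, accumulated = [0], [365], [0.0]
for inches in snowfall:
    if inches > 0:
        # Start (or extend) a snow streak
        accumulated.append(inches if consecutive[-1] == 0
                           else accumulated[-1] + inches)
        consecutive.append(consecutive[-1] + 1)
        since_last.append(0)
    else:
        # No snow today: carry the accumulation, reset the streak
        accumulated.append(accumulated[-1])
        consecutive.append(0)
        since_last.append(since_last[-1] + 1)

print(consecutive[1:])   # [0, 1, 2, 0, 0]
print(since_last[1:])    # [366, 0, 0, 1, 2]
print(accumulated[1:])   # [0.0, 2.0, 3.5, 3.5, 3.5]
```

Note that `accumulated_snow` keeps its last value on snow-free days rather than resetting, which matches how the loop over `df_weather` behaves.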
We downloaded the 311 dataset from data.boston.gov. In order to merge it with the weather dataset, we split the open date of each request into three columns (Year, Month, and Day). We chose the open date rather than the close date because we figured the weather most relevant to a request would be the weather at the time of opening, and not when it was closed.
# Load 311 dataset
df_311 = pd.read_csv('data/311.csv')
# Split out the date column into separate Year, Month, and Day columns
df_311['open_dt'] = pd.to_datetime(df_311['open_dt'])
df_311['Year'] = df_311['open_dt'].dt.year
df_311['Month'] = df_311['open_dt'].dt.month
df_311['Day'] = df_311['open_dt'].dt.day
# Merge the datasets. This performs an INNER JOIN which keeps only items
# that are present in both datasets.
df_311_weather = pd.merge(df_311, df_weather, on=['Year', 'Month', 'Day'])
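To illustrate what the inner join does, here is a minimal sketch with hypothetical rows: any request whose (Year, Month, Day) has no matching weather row is dropped.

```python
import pandas as pd

# Hypothetical 311 requests and weather rows (illustrative values only)
requests = pd.DataFrame({"Year": [2015, 2015], "Month": [2, 7],
                         "Day": [9, 4], "case_title": ["Snow Plowing", "Noise"]})
weather = pd.DataFrame({"Year": [2015], "Month": [2],
                        "Day": [9], "Snowfall (in)": [22.1]})

# Inner join on the date columns: only dates present in BOTH frames survive
merged = pd.merge(requests, weather, on=["Year", "Month", "Day"])
print(len(merged))  # 1
```

The July request disappears because there is no weather row for that date; in our real merge this means 311 requests outside the weather dataset's date range are silently dropped.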
We found the median household income for each region of Boston, and decided to include it in our dataset to try to see if we can find some interesting correlations between income and 311 requests.
df_income = pd.read_csv('data/median income.csv', delimiter=";")
def convert_neighborhood(neighborhood):
    """
    Match neighborhoods in the 311 dataset with regions in the income dataset
    """
    conversions = [("Allston / Brighton", "Allston/Brighton"),
                   ("Allston", "Allston/Brighton"),
                   ("Brighton", "Allston/Brighton"),
                   ("Back Bay", "Back Bay/Beacon Hill"),
                   ("Beacon Hill", "Back Bay/Beacon Hill"),
                   ("Fenway / Kenmore / Audubon Circle / Longwood", "Fenway/Kenmore"),
                   ("Greater Mattapan", "Mattapan"),
                   ("Downtown / Financial District", "Boston"),
                   ("Mission Hill", "West Roxbury"),
                   ("South Boston / South Boston Waterfront", "South Boston"),
                   ("Chestnut Hill", "Allston/Brighton")]
    for left, right in conversions:
        if neighborhood == left:
            return right
    return neighborhood
df_311_weather['neighborhood'] = df_311_weather['neighborhood'].apply(convert_neighborhood)
# Clean up dollar strings into numbers
df_income['median household income'] = df_income['median household income'].apply(lambda money: Decimal(sub(r'[^\d.]', '', str(money))))
df_income
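As a quick illustration of the regex-based cleanup above, the same expression maps a formatted dollar string to a `Decimal` (the input value below is hypothetical):

```python
from decimal import Decimal
from re import sub

def clean_money(money):
    """Strip everything except digits and the decimal point, e.g. "$62,021" -> 62021."""
    return Decimal(sub(r'[^\d.]', '', str(money)))

print(clean_money("$62,021"))  # 62021
```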
df_311_weather_income = pd.merge(df_311_weather, df_income, left_on="neighborhood", right_on="region", how="left")
After merging all of our data sources into one dataset, we get a table in which each row is a 311 request, and the columns include information about that 311 request, what the weather was that day, and the median household income of the region in which the request was made. Below is a sample of this dataset, showing only the columns that are of interest to us. (Scroll right to see the new columns)
df = df_311_weather_income.filter(['Year', 'Month', 'Day', 'case_title', 'reason', 'type', 'neighborhood', 'High Temp (F)', 'Avg Temp (F)', 'Low Temp (F)', 'High Wind (mph)', 'Avg Wind (mph)', 'High Wind Gust (mph)', 'Snowfall (in)', 'Precip (in)', 'Events', 'consecutive_snow_days', 'days_since_last_snow', 'accumulated_snow', 'median household income', 'population'])
df
# Make sure plt is in a clean state
plt.rcdefaults()
# Convert the values in the "case_title" column to strings so .str methods work on them
df["case_title"] = df["case_title"].astype(str)
# Plot counts of 311 requests with and without "snow" in the title,
# split by whether there was snowfall that day.
def graph_snow_requests_vs_actual_requests(df):
    # Days on which there was snowfall ("Snowfall (in)" > 0) and days without
    snowed = df["Snowfall (in)"] > 0
    no_snow = df["Snowfall (in)"] == 0

    # Requests whose case title contains the word "Snow", and those that don't
    has_snow = df["case_title"].str.contains("Snow")
    no_word = ~has_snow

    # Count the requests in each of the four snowfall/title combinations
    count_yes = df[snowed & has_snow]['case_title'].count()
    count_no = df[no_snow & has_snow]['case_title'].count()
    count_noSnowReport = df[snowed & no_word]['case_title'].count()
    count_noSnownoReport = df[no_snow & no_word]['case_title'].count()

    # First graph: requests with and without "snow" in the case title
    # on days when there WAS snowfall
    x_pos = ('"snow" exists', '"snow" doesn\'t exist')
    y_pos = np.arange(len(x_pos))
    performance = [count_yes, count_noSnowReport]
    plt.bar(y_pos, performance, align='center', alpha=0.5)
    plt.xticks(y_pos, x_pos)
    plt.ylabel('Number of 311 requests')
    plt.xlabel('Whether the word "snow" exists within the 311 report title')
    plt.title('Having the word "snow" in 311 requests when there WAS snowfall')
    plt.show()

    # Second graph: requests with and without "snow" in the case title
    # on days when there was NO snowfall
    x_pos = ('"snow" exists', '"snow" doesn\'t exist')
    y_pos = np.arange(len(x_pos))
    performance = [count_no, count_noSnownoReport]
    plt.bar(y_pos, performance, align='center', alpha=0.5)
    plt.xticks(y_pos, x_pos)
    plt.ylabel('Number of 311 requests')
    plt.xlabel('Whether the word "snow" exists within the 311 report title')
    plt.title('Having the word "snow" in 311 requests when there WAS NO snowfall')
    plt.show()
graph_snow_requests_vs_actual_requests(df)
Note: This dataset is from 2011 to 2018, inclusive of all seasons
There are more reports without the word "snow" in the case title even on days with snow, and there are several reasons why that might be. First, "snowfall" in our dataset is any day with more than 0 inches of snow, and an inch or so of snow does not impact the community much. Also, not all snow-related reports include the word "snow" in the title: a report might read "frozen road" or "request for street cleaning", and such reports are omitted from our count because we only match titles that explicitly contain "snow".
Looking at the graph, there are also more reports without the word "snow" in the case title when there is no snow. The most obvious reason is that this dataset includes all seasons from 2011 to 2018. Boston gets quite snowy during the winter months, but snow is rare from spring to fall, so a dataset spanning all seasons naturally contains many more reports unrelated to snowfall.
winter_2014_2015 = ((df['Year'] == 2014) & (df['Month'] > 8)) | ((df['Year'] == 2015) & (df['Month'] < 5))
graph_snow_requests_vs_actual_requests(df[winter_2014_2015])
Note: This dataset is from 2014 to 2015, only including the winter months
Unlike the bar graph above for the same question, with this dataset that only includes the winter months, we now have more 311 requests with the word "snow" in the case title. We explicitly picked the winter of 2014 to 2015 because those were the years with a lot of snowfall, and probably for that reason, we also had a lot more 311 requests relating to snow. As stated above, this count omits requests that do not explicitly have the word "snow" in the case title, but even so, we still have more reports with "snow" in the title. This tells us that the winter of 2014-2015 was indeed quite severe in terms of how badly the weather affected the residents of Boston.

Similar to the bar graph above for the same question, the winter-only graph also shows more 311 requests without the word "snow" in the case title when there was no snowfall. This is probably because, with no snow that day, there were fewer problems, or at least less severe problems, related to snow. We still see some reports explicitly relating to snowfall, and those are probably due to remnants of snow from snowy weather a day or so earlier.
Next, we investigate three types of extreme weather (windy, snowy, and chilly) and how each may affect the kinds of 311 requests that come in. It is interesting to see which sorts of requests occur more in certain types of weather than in others.
One visually effective way to observe trends in word choice is by generating word clouds. Below, we define which requests fit our criteria for each weather type, and then we generate wordclouds from the request text for those requests.
In order to help define a "snowy" day, we use the days_since_last_snow metric we came up with in the datasets section of this project. We want to include the day after a snowfall because we expect snow removal requests to continue into the next day. While many snow requests happen exactly on the day it snowed, some come in on the following days, as demonstrated in the calculation below.
# What is the average days_since_last_snow for snow removal requests
# 1. filter the table for only entries about snow removal
plow_requests = df['case_title'].str.contains("Snow", na=False)
# 2. for those entries, compute the average of days_since_last_snow
print(df[plow_requests & winter_2014_2015]['days_since_last_snow'].mean())
print(df[plow_requests]['days_since_last_snow'].mean())
## WINDY ##
windy = df['Avg Wind (mph)'] > 30
## SNOWY ##
snowy = df['days_since_last_snow'] < 2
## CHILLY ##
celsius = df['Low Temp (F)'].apply(lambda x: (x - 32) * (5.0 / 9.0))
chilly = celsius < (-19)
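As a reference point for the threshold, the temperature conversion can be checked both ways (a quick sketch; the -2.2 °F figure is our own arithmetic, not from the source data):

```python
def f_to_c(f):
    # Standard Fahrenheit-to-Celsius conversion, as used for the chilly mask
    return (f - 32) * 5.0 / 9.0

def c_to_f(c):
    # Inverse conversion, to see what the cutoff means in the dataset's units
    return c * 9.0 / 5.0 + 32

# The -19 degrees C cutoff corresponds to a daily low of about -2.2 degrees F
print(round(c_to_f(-19), 1))  # -2.2
```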
def make_wordcloud_from_text(text):
    wordcloud = WordCloud(
        width=3000,
        height=2000,
        background_color='white',
        stopwords=STOPWORDS).generate(text)
    fig = plt.figure(
        figsize=(20, 15),
        facecolor='white',
        edgecolor='white')
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.tight_layout(pad=0)
    plt.show()

def make_wordcloud(df):
    text_cols = ['reason', 'type']
    make_wordcloud_from_text(" ".join(df[col_name].str.cat(sep=" ") for col_name in text_cols))
The first type of extreme weather that we showcase is the snowy case. We define this to be days on which it snowed, as well as the day immediately following a snow day (days_since_last_snow < 2). The biggest words are Snow Plowing Request, which fits our expectations. There are also requests to repair potholes, and reports of missed trash pickups. There are many requests for Scheduled Bulk Item Pickup, which baffles us a little bit - perhaps these are reports of missed pickups.
An interesting snow-related request is "Parking Enforcement" - it would be interesting to know how many of these are people who had marked their spot with lawn chairs. More likely, since snow makes many previous parking spaces unviable, people are more likely to park illegally and thus trigger more parking enforcement-related 311 calls.
make_wordcloud(df[snowy])
In the wordcloud below, observe the words that come up most often for windy days (days with an average wind speed of 30 mph or higher). Tree-related requests, as well as Downed Wire reports, are very common. Street lights, traffic lights, and signal repair all suggest that Boston's traffic control infrastructure is vulnerable to high winds. There are also many requests for street cleaning.
make_wordcloud(df[windy])
Snow is only one of the challenges Bostonians face in the winter. The other one is just sheer cold. What kinds of calls do Bostonians make in subfreezing temperatures?
The biggest words by far come from the request "Heat Excessive Insufficient" which seems like a catch-all type for heating related issues.
make_wordcloud(df[chilly])
Our first definition of chilly looked only at temperatures below negative 20 degrees Celsius. This turned out to be too aggressive a cutoff, since it left us with only one day on the record: February 14th, 2016. On this day, there were several requests for "Heat Excessive Insufficient" from Charlestown. While it made a nice (and different-looking) wordcloud, we soon realized that this was merely due to a small amount of data. Moving the threshold up to negative 19 degrees Celsius brought in a couple more years' worth of information (still in mid-February, which seems to be the coldest time in Boston).
Here, we correlate the frequency of 311 reports with each region in Boston by counting the occurrences of each neighborhood in the "neighborhood" column of the 311 dataset.
# Data manipulation: get counts of reports by neighborhood
def plot_value_counts(value_counts, tag=""):
    x_pos = np.arange(len(value_counts.keys()))
    plt.bar(x_pos, list(value_counts[key] for key in value_counts.keys()), align='center',
            color='green', ecolor='black')
    plt.xticks(x_pos, value_counts.keys(), rotation='vertical')
    plt.xlabel("Neighborhood")
    plt.ylabel("Number of 311 Reports Divided By Population")
    plt.title("Region Correlated With Frequency of 311 Reports" + ("" if len(tag) == 0 else ": {}".format(tag)))
    plt.show()

def plot_reports_by_region(df, tag=""):
    new_df = df.groupby(by="neighborhood").aggregate({'case_title': 'count',
                                                      'population': 'max'})
    plot_value_counts((new_df['case_title'] / new_df['population']).sort_values(ascending=False), tag=tag)
plot_reports_by_region(df)
The bar graph above is normalized by population in thousands, and shows the frequency of 311 requests made per region in Boston. The first time we generated the graph, we did not normalize the data, so a populous region like Dorchester had the highest raw count of 311 reports. Without normalization there is little we can analyze from the graph, so we divided each region's report count by the population of the neighborhood.
Now looking at the normalized data, we see that Boston has the most 311 reports overall. The population figure we used for Boston isn't very accurate, however (we couldn't find an exact number, so we approximated it from the population data of other regions), so we take Boston's values with a grain of salt. West Roxbury comes next with the second-highest number of reports, and Fenway/Kenmore has the lowest. With the data normalized, Dorchester is no longer the neighborhood that produces the most 311 requests.
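To make the effect of the normalization concrete, here is a minimal sketch with hypothetical numbers: a large neighborhood can dominate raw counts while a small one leads per capita.

```python
# Hypothetical, illustrative counts and populations only
reports = {"Big Neighborhood": 9000, "Small Neighborhood": 3000}
population = {"Big Neighborhood": 120000, "Small Neighborhood": 20000}

# Raw counts rank the big neighborhood first...
most_raw = max(reports, key=reports.get)
# ...but dividing by population flips the ranking (0.075 vs 0.15 reports/person)
per_capita = {n: reports[n] / population[n] for n in reports}
most_per_capita = max(per_capita, key=per_capita.get)

print(most_raw, "->", most_per_capita)  # Big Neighborhood -> Small Neighborhood
```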
# Three different bar graphs showing the frequency of 311 reports in each region of Boston
# by three different extreme weather conditions: snowy, windy, and chilly.
plot_reports_by_region(df[snowy], tag="Snowy")
plot_reports_by_region(df[windy], tag="Windy")
plot_reports_by_region(df[chilly], tag="Chilly")
We correlated different types of 311 reports with request frequency and region in Boston. West Roxbury makes the most requests related to snowy and windy weather, while Charlestown makes the most requests concerning freezing temperatures. Fenway/Kenmore remains the region that makes the fewest 311 reports across all conditions. Wondering why, we looked at its income data and found that its income level was the second-lowest of all the regions of Boston; according to Google, Fenway/Kenmore is also known to be quite poverty-prone. Now curious about how income relates to the frequency of 311 requests, we made the scatterplots below.
After looking at which regions have the most 311 requests, we looked at the median household income of each of those regions to try to find a correlation between median household income and frequency of 311 requests.
def plot_requests_by_income(df, tag=""):
    # Data manipulation: get counts of reports by income level, normalized by population
    new_df = df.groupby('median household income').aggregate({'neighborhood': 'min',
                                                              'case_title': 'count',
                                                              'population': 'max'})
    value_counts = new_df['case_title'] / new_df['population']
    fig, ax = plt.subplots()
    x = list(map(int, value_counts.keys()))
    y = list(value_counts[key] for key in value_counts.keys())
    # Plot how many of our reports come from low vs high income areas
    plt.xlabel("Income ($)")
    plt.ylabel("Number of 311 Requests Divided By Population")
    plt.title("Income Correlated With Number of 311 Reports" + ("" if len(tag) == 0 else ":\n{}".format(tag)))
    # Fit and draw a linear trend line through the scatter points
    fit = np.polyfit(x, y, 1)
    fit_fn = np.poly1d(fit)
    plt.plot(x, y, 'yo', x, fit_fn(x), '--k')
    for i, txt in enumerate(value_counts.keys()):
        ax.annotate(df[df['median household income'] == txt]['neighborhood'].min(), (x[i], y[i]))
    print("rvalue:", stats.linregress(x, y).rvalue)
    plt.show()
plot_requests_by_income(df)
The scatterplot above has been normalized by dividing the number of 311 requests by the population of each region. We do in fact see a trend where lower-income regions make fewer 311 requests while higher-income regions make more. The calculated r-value is 0.55, indicating a moderate positive correlation: unlikely to be pure chance, but far from conclusive. When we first started this project, we hypothesized that poorer regions would make more 311 requests due to less adequate living conditions; the data argues otherwise, with richer regions filing more requests. We thought of several possible reasons. Perhaps people in more poverty-prone regions are too busy with everyday survival to attend much to their surrounding environment, or may lack reliable internet or phone access. It is also possible that richer people are simply pickier about their living conditions and how well maintained their neighborhood is. We then looked into the relationship between types of 311 requests and income per region.
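The r-values quoted here come from scipy.stats.linregress; for intuition, the Pearson correlation coefficient it reports can be sketched in plain Python (a minimal sketch, not the scipy implementation):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient, the quantity reported as rvalue."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

# A perfectly linear relationship gives r = 1; a perfectly inverse one gives r = -1
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
print(round(pearson_r([1, 2, 3], [3, 2, 1]), 6))        # -1.0
```

Values like 0.55 sit between these extremes: the points lean along the fitted line but scatter widely around it.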
plot_requests_by_income(df[df['reason'] == 'Street Cleaning'])
Diving deeper into details, let's look at a specific reason for filing 311 requests; in particular, those related to street cleaning. This scatterplot looks nearly identical to the general scatterplot above; however, the r-value has decreased a little to 0.46, so the correlation here is somewhat weaker.
plot_requests_by_income(df[snowy])
We will now look only at 311 requests made in snowy weather. The r-value is 0.59, so the correlation is weak, yet above chance. The scatterplot shows basically the same trend as the ones above, where regions with less income file fewer 311 requests. Some major complaints during heavy snowfall relate to snow shoveling, abandoned vehicles, and parking enforcement. For one thing, poorer regions may have fewer problems with vehicles, given that it is usually cheaper to use public transportation than to own a car and maintain it. If that assumption is correct and lower-income regions do have fewer people who own cars, then it makes more sense that they would file fewer reports about abandoned vehicles and parking enforcement.
plot_requests_by_income(df[chilly & (df['reason'] == 'Housing')], tag="Housing-Related Requests on Chilly Days")
This scatterplot looks specifically at the number of housing-related 311 requests made during freezing temperatures. Here, for once, we see more reports coming from lower-income regions. This trend may be due to the worse living and housing conditions of those in poverty. One would expect someone living in a wealthier neighborhood to have better heating, ventilation, and sturdier housing in general, and so to suffer fewer setbacks from the cold than those whose housing is not as decent. However, we have to keep in mind that the r-value is only -0.35, so the correlation is rather weak.
Our most general question in this project concerned the relationship between 311 requests and the weather in Boston. We then made the question more specific by looking into the frequency and types of 311 requests in correlation with weather and region in Boston.
The first section of our Jupyter Notebook looks into how the presence of snowfall relates to the types of 311 requests made. We counted the number of requests that were explicitly related or unrelated to snow against whether there was actually snowfall. Using two different datasets (one including all seasons from 2011 to 2018, the other only the winter of 2014 to 2015), we were able to see how the severity of snowfall affected what types of 311 requests were made. With heavy snowfall came more 311 requests strongly related to snow, such as requests for "snow plowing" and "street cleaning". We can see this relationship in the word clouds made in the next section of the notebook, which show the different types of 311 requests made under various kinds of extreme weather.
Taking a look at the word cloud titled "Snowy", we can see which 311 requests are filed most frequently on days when it snowed and on the day immediately after. Unsurprisingly, most requests asked for snow plowing, maintenance of streets and highways, clearance of abandoned vehicles, and help with heating problems. Snowy weather seemed to have a link with chilly weather of under negative 19 degrees Celsius: the word cloud for "Chilly" shares several requests with "Snowy", including "street cleaning/snow plowing", maintenance of highways/streets/sidewalks, "abandoned vehicles", and various heat problems. One more cold-related weather condition is windiness, and in the word cloud titled "Windy" we see lots of "tree emergencies", "street cleaning", and problems with lights, wires, signs, and traffic signals.
Overall, with extreme weather relating to cold (snow, wind, and under freezing temperatures), Boston has problems relating to the maintenance of streets and highways, inadequate heating conditions, and abundance of abandoned vehicles.
We then looked into which regions of Boston generate the most 311 requests. Boston had the most 311 reports overall, West Roxbury the second-most, and Fenway/Kenmore the fewest. In fact, Fenway/Kenmore consistently had the fewest 311 requests regardless of which type of report we looked at, which led us to examine income per region to see whether income correlates with 311 request frequency.
We drew scatterplots of income against the number of 311 requests. All of the scatterplots showed weak correlations, though most were above chance, with lower-median-income regions making fewer 311 requests.
The main takeaway from this project is the fact that both weather patterns and income have some correlation with the types of 311 requests received. This is altogether unsurprising; however, given what we found, the City of Boston can continue to prepare itself to handle tree issues when it's windy and build-up of snow on streets when it's snowy.
All three of us contributed evenly to finding datasets to use. Wanqi translated the online data of income by region into CSV format, and Matthew worked on combining the datasets into the final CSV file we used for this assignment. Most of the coding and visualizations were done by Matthew, including adding the necessary columns to the dataset, creating graphs, and generating word clouds. Jin took care of most of the data analysis, including commenting the code to help audiences better understand it, and writing interpretations of what can be observed from the visualizations. Wanqi wrote the introduction for the assignment, and Jin wrote the conclusion.